Many networks of interest in the sciences, including a variety of social and biological networks, are found to divide naturally into communities or modules. The problem of detecting and characterizing this community structure has attracted considerable recent attention. One of the most sensitive detection methods is optimization of the quality function known as "modularity" over the possible divisions of a network, but direct application of this method using, for instance, simulated annealing is computationally costly. Here we show that the modularity can be reformulated in terms of the eigenvectors of a new characteristic matrix for the network, which we call the modularity matrix, and that this reformulation leads to a spectral algorithm for community detection that returns results of better quality than competing methods in noticeably shorter running times. We demonstrate the algorithm with applications to several network data sets.



Introduction 

Many systems of scientific interest can be represented as networks: sets of nodes or vertices joined in pairs by lines or edges. Examples include the Internet and the worldwide web, metabolic networks, food webs, neural networks, communication and distribution networks, and social networks. The study of networked systems has a history stretching back several centuries, but it has experienced a particular surge of interest in the last decade, especially in the mathematical sciences, partly as a result of the increasing availability of large-scale accurate data describing the topology of networks in the real world. Statistical analyses of these data have revealed some unexpected structural features, such as high network transitivity [?], power-law degree distributions [?], and the existence of repeated local motifs [?]; see [?] for reviews.

One issue that has received a considerable amount of attention is the detection and characterization of community structure in networks [?], meaning the appearance of densely connected groups of vertices, with only sparser connections between groups (Fig. 1). The ability to detect such groups could be of significant practical importance. For instance, groups within the worldwide web might correspond to sets of web pages on related topics [?]; groups within social networks might correspond to social units or communities [10]. The mere finding that a network contains tightly knit groups at all can convey useful information: if a metabolic network were divided into such groups, for instance, it could provide evidence for a modular view of the network's dynamics, with different groups of nodes performing different functions with some degree of independence [11], [?].

Past work on methods for discovering groups in networks divides into two principal lines of research, both with long histories. The first, which goes by the name of graph partitioning, has been pursued particularly in computer science and related fields, with applications in parallel computing and VLSI design, among other areas [?]. The second, identified by names such as block modeling, hierarchical clustering, or community structure detection, has been pursued by sociologists and more recently also by physicists and applied mathematicians, with applications especially to social and biological networks [?].

FIG. 1: The vertices in many networks fall naturally into groups or communities, sets of vertices (shaded) within which there are many edges, with only a smaller number of edges between vertices of different groups.

It is tempting to suggest that these two lines of research are really addressing the same question, albeit by somewhat different means. There are, however, important differences between the goals of the two camps that make quite different technical approaches desirable. A typical problem in graph partitioning is the division of a set of tasks between the processors of a parallel computer so as to minimize the necessary amount of interprocessor communication. In such an application the number of processors is usually known in advance, as is at least an approximate figure for the number of tasks that each processor can handle. Thus we know the number and size of the groups into which the network is to be split. Also, the goal is usually to find the best division of the network regardless of whether a good division even exists; there is little point in an algorithm or method that fails to divide the network in some cases.

Community structure detection, by contrast, is perhaps best thought of as a data analysis technique used to shed light on the structure of large-scale network datasets, such as social networks, Internet and web data, or biochemical networks. Community structure methods normally assume that the network of interest divides naturally into subgroups and the experimenter's job is to find those groups. The number and size of the groups are thus determined by the network itself and not by the experimenter. Moreover, community structure methods may explicitly admit the possibility that no good division of the network exists, an outcome that is itself considered to be of interest for the light it sheds on the topology of the network.

In this paper our focus is on community structure detection in network datasets representing real-world systems of interest. However, both the similarities and differences between community structure methods and graph partitioning will motivate many of the developments that follow.



The method of optimal modularity 

Suppose then that we are given, or discover, the structure of some network and that we wish to determine whether there exists any natural division of its vertices into nonoverlapping groups or communities, where these communities may be of any size.

Let us approach this question in stages and focus initially on the problem of whether any good division of the network exists into just two communities. Perhaps the most obvious way to tackle this problem is to look for divisions of the vertices into two groups so as to minimize the number of edges running between the groups. This "minimum cut" approach is the one adopted, virtually without exception, in the algorithms studied in the graph partitioning literature. However, as discussed above, the community structure problem differs crucially from graph partitioning in that the sizes of the communities are not normally known in advance. If community sizes are unconstrained then we are, for instance, at liberty to select the trivial division of the network that puts all the vertices in one of our two groups and none in the other, which guarantees we will have zero intergroup edges. This division is, in a sense, optimal, but clearly it does not tell us anything of any worth. We can, if we wish, artificially forbid this solution, but then a division that puts just one vertex in one group and the rest in the other will often be optimal, and so forth.

The problem is that simply counting edges is not a good way to quantify the intuitive concept of community structure. A good division of a network into communities is not merely one in which there are few edges between communities; it is one in which there are fewer than expected edges between communities. If the number of edges between two groups is only what one would expect on the basis of random chance, then few thoughtful observers would claim this constitutes evidence of meaningful community structure. On the other hand, if the number of edges between groups is significantly less than we expect by chance (or, equivalently, if the number within groups is significantly more), then it is reasonable to conclude that something interesting is going on.

This idea, that true community structure in a network corresponds to a statistically surprising arrangement of edges, can be quantified using the measure known as modularity [?]. The modularity is, up to a multiplicative constant, the number of edges falling within groups minus the expected number in an equivalent network with edges placed at random. (A precise mathematical formulation is given below.)

The modularity can be either positive or negative, with positive values indicating the possible presence of community structure. Thus, one can search for community structure precisely by looking for the divisions of a network that have positive, and preferably large, values of the modularity [?].

The evidence so far suggests that this is a highly effective way to tackle the problem. For instance, Guimera and Amaral [?] and later Danon et al. [?] optimized modularity over possible partitions of computer-generated test networks using simulated annealing. In direct comparisons using standard measures, Danon et al. found that this method outperformed all other methods for community detection of which they were aware, in most cases by an impressive margin. On the basis of considerations such as these we consider maximization of the modularity to be perhaps the definitive current method of community detection, being at the same time based on sensible statistical principles and highly effective in practice.

Unfortunately, optimization by simulated annealing is not a workable approach for the large network problems facing today's scientists, because it demands too much computational effort. A number of alternative heuristic methods have been investigated, such as greedy algorithms [18] and extremal optimization [19]. Here we take a different approach based on a reformulation of the modularity in terms of the spectral properties of the network of interest.

Suppose our network contains $n$ vertices. For a particular division of the network into two groups let $s_i = 1$ if vertex $i$ belongs to group 1 and $s_i = -1$ if it belongs to group 2. And let the number of edges between vertices $i$ and $j$ be $A_{ij}$, which will normally be 0 or 1, although larger values are possible in networks where multiple edges are allowed. (The quantities $A_{ij}$ are the elements of the so-called adjacency matrix.) At the same time, the expected number of edges between vertices $i$ and $j$ if edges are placed at random is $k_i k_j/2m$, where $k_i$ and $k_j$ are the degrees of the vertices and $m = \frac{1}{2}\sum_i k_i$ is the total number of edges in the network. Thus the modularity can be written

$$Q = \frac{1}{4m}\sum_{ij}\left(A_{ij} - \frac{k_i k_j}{2m}\right)s_i s_j = \frac{1}{4m}\,\mathbf{s}^T\mathbf{B}\,\mathbf{s}, \tag{1}$$

where $\mathbf{s}$ is the vector whose elements are the $s_i$. The leading factor of $1/4m$ is merely conventional: it is included for compatibility with the previous definition of modularity [17].
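The step from the earlier definition of modularity to the $\mathbf{s}$-based form can be made explicit. The derivation below is a sketch that assumes the earlier definition has the usual form $Q = \frac{1}{2m}\sum_{ij}(A_{ij} - k_i k_j/2m)\,\delta(g_i,g_j)$, where $g_i$ is the group of vertex $i$ and $\delta(g_i,g_j)$ is 1 if $i$ and $j$ are in the same group and 0 otherwise; the text above does not spell this out.

```latex
% Same-group indicator expressed through the index variables s_i = \pm 1:
\delta(g_i, g_j) = \tfrac{1}{2}\,(s_i s_j + 1)

% Substituting into the (assumed) earlier definition of Q:
Q = \frac{1}{2m}\sum_{ij}\Big(A_{ij} - \frac{k_i k_j}{2m}\Big)\tfrac{1}{2}(s_i s_j + 1)
  = \frac{1}{4m}\sum_{ij}\Big(A_{ij} - \frac{k_i k_j}{2m}\Big)s_i s_j
  + \frac{1}{4m}\sum_{ij}\Big(A_{ij} - \frac{k_i k_j}{2m}\Big)

% The last sum vanishes, because \sum_{ij} A_{ij} = \sum_i k_i = 2m and
% \sum_{ij} k_i k_j / 2m = (2m)^2 / 2m = 2m. What remains is
Q = \frac{1}{4m}\sum_{ij}\Big(A_{ij} - \frac{k_i k_j}{2m}\Big)s_i s_j
```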

We have here defined a new real symmetric matrix $\mathbf{B}$ with elements

$$B_{ij} = A_{ij} - \frac{k_i k_j}{2m}, \tag{2}$$

which we call the modularity matrix. Much of our attention in this paper will be devoted to the properties of this matrix. For the moment, note that the elements of each of its rows and columns sum to zero (since $\sum_j A_{ij} = k_i$ and $\sum_j k_j = 2m$), so that it always has an eigenvector $(1,1,1,\ldots)$ with eigenvalue zero. This observation is reminiscent of the matrix known as the graph Laplacian [20], which is the basis for one of the best-known methods of graph partitioning, spectral partitioning [?, 22], and has the same property. And indeed, the methods presented in this paper have many similarities to spectral partitioning, although there are some crucial differences as well, as we will see.
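As a minimal sketch, assuming NumPy and a dense symmetric adjacency matrix, the definitions $B_{ij} = A_{ij} - k_i k_j/2m$ and $Q = \mathbf{s}^T\mathbf{B}\mathbf{s}/4m$ translate directly into code. The function names `modularity_matrix` and `modularity` are our own, and the toy network (two triangles joined by a single edge) is a hypothetical example chosen because its natural split is obvious.

```python
import numpy as np

def modularity_matrix(A):
    """Return B with B_ij = A_ij - k_i k_j / 2m for a symmetric adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)             # degrees k_i
    two_m = k.sum()               # sum of degrees = 2m
    return A - np.outer(k, k) / two_m

def modularity(A, s):
    """Return Q = s^T B s / 4m for an index vector s with entries +1 or -1."""
    A = np.asarray(A, dtype=float)
    s = np.asarray(s, dtype=float)
    B = modularity_matrix(A)
    four_m = 2 * A.sum()          # A.sum() is 2m, so this is 4m
    return s @ B @ s / four_m

# Hypothetical toy network: triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

B = modularity_matrix(A)
print(np.allclose(B.sum(axis=1), 0))          # rows sum to zero: True
print(modularity(A, [1, 1, 1, -1, -1, -1]))   # approximately 5/14 = 0.357...
print(modularity(A, [1, 1, 1, 1, 1, 1]))      # undivided network: approximately 0
```

The value 5/14 can be checked by hand: 6 of the 7 edges fall within groups, and each group holds half of the total degree, so $Q = 6/7 - (1/2)^2 - (1/2)^2 = 5/14$.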

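Given Eq. (1), we proceed by writing $\mathbf{s}$ as a linear combination of the normalized eigenvectors $\mathbf{u}_i$ of $\mathbf{B}$ so that $\mathbf{s} = \sum_{i=1}^n a_i \mathbf{u}_i$ with $a_i = \mathbf{u}_i^T \cdot \mathbf{s}$. Then, using $\mathbf{B}\mathbf{u}_j = \beta_j \mathbf{u}_j$ and the orthonormality of the eigenvectors, we find

$$Q = \sum_i a_i \mathbf{u}_i^T \mathbf{B} \sum_j a_j \mathbf{u}_j = \sum_{i=1}^n (\mathbf{u}_i^T \cdot \mathbf{s})^2 \beta_i, \tag{3}$$

where $\beta_i$ is the eigenvalue of $\mathbf{B}$ corresponding to eigenvector $\mathbf{u}_i$. Here, and henceforth, we drop the leading factor of $1/4m$ for the sake of brevity.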
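The eigenvector expansion of $Q$ can be checked numerically. This sketch assumes NumPy and reuses the hypothetical two-triangle network; `np.linalg.eigh` is used because $\mathbf{B}$ is real and symmetric, so it returns real eigenvalues (in ascending order) and orthonormal eigenvectors as the columns of `U`.

```python
import numpy as np

# Hypothetical toy network: two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

k = A.sum(axis=1)
B = A - np.outer(k, k) / k.sum()     # modularity matrix

beta, U = np.linalg.eigh(B)          # eigenvalues beta_i, eigenvectors u_i = U[:, i]

s = np.array([1, 1, 1, -1, -1, -1], dtype=float)
a = U.T @ s                          # coefficients a_i = u_i^T s
lhs = s @ B @ s                      # s^T B s (modularity without the 1/4m factor)
rhs = np.sum(a**2 * beta)            # sum_i (u_i^T s)^2 beta_i
print(np.isclose(lhs, rhs))          # True
```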

Let us assume that the eigenvalues are labeled in decreasing order, $\beta_1 \ge \beta_2 \ge \cdots \ge \beta_n$. We wish to maximize the modularity by choosing an appropriate division of the network, or equivalently by choosing the value of the index vector $\mathbf{s}$. This means choosing $\mathbf{s}$ so as to concentrate as much weight as possible in the terms of the sum involving the largest (most positive) eigenvalues. If there were no other constraints on our choice of $\mathbf{s}$ (apart from normalization), this would be an easy task: we would simply choose $\mathbf{s}$ proportional to the eigenvector $\mathbf{u}_1$. This places all of the weight in the term involving the largest eigenvalue $\beta_1$, the other terms being automatically zero, since the eigenvectors are orthogonal.

Unfortunately, there is another constraint on the problem imposed by the restriction of the elements of $\mathbf{s}$ to the values $\pm 1$, which means $\mathbf{s}$ cannot normally be chosen proportional to $\mathbf{u}_1$. This makes the optimization problem much more difficult. Indeed, it seems likely that the problem is computationally NP-hard, since it is formally equivalent to an instance of the NP-hard MAX-CUT problem. This makes it improbable that a simple procedure exists for finding the optimal $\mathbf{s}$, so we turn instead to approximate methods.

An optimization problem similar to this one appears in the development of the spectral partitioning method, and in that case a very simple approximation is found to be effective, namely maximizing the term involving the leading eigenvalue and completely ignoring all the others. As we now show, the same approach turns out to be effective here too: we simply ignore the inconvenient fact that it is not possible to make $\mathbf{s}$ perfectly parallel to $\mathbf{u}_1$ and go ahead and maximize the term in $\beta_1$ anyway.

Given that we are free to choose the sizes of our two groups of vertices, the greatest value of the coefficient $(\mathbf{u}_1^T \cdot \mathbf{s})^2$ in this term is achieved by dividing the vertices according to the signs of the elements in $\mathbf{u}_1$, because $\mathbf{u}_1^T \cdot \mathbf{s} = \sum_i u_{1i} s_i$ is largest in magnitude when every $s_i$ has the same sign as $u_{1i}$. That is, all vertices whose corresponding elements in $\mathbf{u}_1$ are positive go in one group and all the rest in the other group. So this is our algorithm: we compute the leading eigenvector of the modularity matrix and divide the vertices into two groups according to the signs of the corresponding elements in this vector.
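The two-way split just described can be sketched in a few lines, assuming NumPy and a dense adjacency matrix; `leading_eigenvector_split` is a hypothetical name, not taken from the text. The overall sign of an eigenvector is arbitrary, so which triangle of the toy network ends up labeled $+1$ may vary; vertices whose element is exactly zero go with "the rest", as in the text.

```python
import numpy as np

def leading_eigenvector_split(A):
    """Two-way split by the signs of the leading eigenvector of B.

    Returns (s, beta1, u1): the +1/-1 index vector, the largest eigenvalue
    of B, and its normalized eigenvector.
    """
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)
    B = A - np.outer(k, k) / k.sum()
    beta, U = np.linalg.eigh(B)        # eigenvalues in ascending order
    u1 = U[:, -1]                      # eigenvector of the most positive eigenvalue
    s = np.where(u1 > 0, 1, -1)        # positive elements -> one group, the rest -> other
    return s, beta[-1], u1

# Hypothetical toy network: two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

s, beta1, u1 = leading_eigenvector_split(A)
print(beta1)                                 # approximately sqrt(3) = 1.732...
print([v for v in range(6) if s[v] > 0])     # one triangle: [0, 1, 2] or [3, 4, 5]
print(np.round(np.abs(u1), 3))               # vertices 2 and 3 have the smallest magnitudes
```

For this network a short hand calculation gives $\mathbf{u}_1 \propto (1, 1, c, -c, -1, -1)$ with $c = \sqrt{3} - 1 \approx 0.732$, so the two vertices joined by the bridging edge carry the smallest weights, in line with the idea that small elements mark vertices near the boundary between groups.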

We immediately notice some satisfying features of this method. First, as we have made clear, it works even though the sizes of the communities are not specified. Unlike conventional partitioning methods that minimize the number of between-group edges, there is no need to constrain the group sizes or to artificially forbid the trivial solution with all vertices in a single group. There is an eigenvector $(1,1,1,\ldots)$ corresponding to such a trivial solution, but its eigenvalue is zero. All other eigenvectors are orthogonal to this one and hence must possess both positive and negative elements. Thus, so long as there is any positive eigenvalue, our method will not put all vertices in the same group.

It is, however, possible for there to be no positive eigenvalues of the modularity matrix. In this case the leading eigenvector is the vector $(1,1,1,\ldots)$ corresponding to all vertices in a single group together. But this is precisely the correct result: the algorithm is in this case telling us that there is no division of the network that results in positive modularity, as can immediately be seen from Eq. (3), since all terms in the sum will be zero or negative. The modularity of the undivided network is zero, which is the best that can be achieved. This is an important feature of our algorithm. The algorithm has the ability not only to divide networks effectively, but also to refuse to divide them when no good division exists. We will call the networks in this latter case indivisible. That is, a network is indivisible if the modularity matrix has no positive eigenvalues. This idea will play a crucial role in later developments.
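A quick illustration of indivisibility, as a sketch assuming NumPy: for the complete graph on $n$ vertices, $A = J - I$ (with $J$ the all-ones matrix), every degree is $n-1$ and $2m = n(n-1)$, so $\mathbf{B} = J/n - I$, whose eigenvalues are 0 (for $(1,1,\ldots)$) and $-1$. A clique is therefore indivisible. The helper `is_indivisible` and its tolerance `tol` (needed because floating-point eigenvalues are never exactly zero) are our own assumptions.

```python
import numpy as np

def is_indivisible(A, tol=1e-10):
    """True if the modularity matrix of A has no eigenvalue above tol (assumed tolerance)."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)
    B = A - np.outer(k, k) / k.sum()
    return bool(np.linalg.eigvalsh(B)[-1] <= tol)   # eigvalsh sorts eigenvalues ascending

K5 = np.ones((5, 5)) - np.eye(5)                    # complete graph on 5 vertices
print(is_indivisible(K5))                           # True: eigenvalues are 0 and -1

two_triangles = np.zeros((6, 6))                    # hypothetical toy network
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    two_triangles[i, j] = two_triangles[j, i] = 1.0
print(is_indivisible(two_triangles))                # False: leading eigenvalue is sqrt(3)
```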

Our algorithm as we have described it makes use only of the signs of the elements of the leading eigenvector, but the magnitudes convey information too. Vertices corresponding to elements of large magnitude make large contributions to the modularity, Eq. (3), and conversely for small ones. This means that moving a vertex corresponding to an element of small magnitude from one group to the other makes little difference to $Q$. In other words, the magnitudes of the elements are a measure of how "strongly" a vertex belongs to one community or the other, and vertices with elements close to zero are, in a sense, on the borderline between communities. Thus our algorithm allows us not merely to divide the vertices into groups, but to place them on a continuous scale of "how much" they belong to one group or the other.

FIG. 2: Application of our eigenvector-based method to the "karate club" network of Ref. [23]. Shapes of vertices indicate the membership of the corresponding individuals in the two known factions of the network, while the dotted line indicates the split found by the algorithm, which matches the factions exactly. The shades of the vertices indicate the strength of their membership, as measured by the value of the corresponding element of the eigenvector.

As an example of this algorithm we show in Fig. 2 the result of its application to a famous network from the social science literature, which has become something of a standard test for community detection algorithms. The network is the "karate club" network of Zachary [23], which shows the pattern of friendships between the members of a karate club at a US university in the 1970s. This example is of particular interest because, shortly after the observation and construction of the network, the club in question split in two as a result of an internal dispute. Applying our eigenvector-based algorithm to the network, we find the division indicated by the dotted line in the figure, which coincides exactly with the known division of the club in real life.

The vertices in Fig. 2 are shaded according to the values of the elements in the leading eigenvector of the modularity matrix, and these values also seem to accord well with known social structure within the club. In particular, the three vertices with the heaviest weights, either positive or negative (black and white vertices in the figure), correspond to the known ringleaders of the two factions.



Dividing networks into more than two communities 

In the preceding section we have given a simple matrix-based method for finding a good division of a network into two parts. Many networks, however, contain more than two communities, so we would like to extend our method to find good divisions of networks into larger numbers of parts. The standard approach to this problem, and the one adopted here, is repeated division into two: we use the algorithm of the previous section first to divide the network into two parts, then divide those parts, and so forth.

In doing this it is crucial to note that it is not correct, after first dividing a network in two, simply to delete the edges falling between the two parts and then apply the algorithm again to each subgraph. This is because the degrees appearing in the definition, Eq. (1), of the modularity will change if edges are deleted, and any subsequent maximization of modularity would thus maximize the wrong quantity. Instead, the correct approach is to define for each subgraph $g$ a new $n_g \times n_g$ modularity matrix $\mathbf{B}^{(g)}$, where $n_g$ is the number of vertices in the subgraph. The correct definition of the element of this matrix for vertices $i, j$ is

$$B^{(g)}_{ij} = A_{ij} - \frac{k_i k_j}{2m} - \delta_{ij}\left[k_i^{(g)} - k_i \frac{d_g}{2m}\right], \tag{4}$$

where $\delta_{ij}$ is the Kronecker delta, $k_i^{(g)}$ is the degree of vertex $i$ within subgraph $g$, and $d_g$ is the sum of the (total) degrees $k_i$ of the vertices in the subgraph. Then the subgraph modularity

$$Q_g = \mathbf{s}^T \mathbf{B}^{(g)} \mathbf{s} \tag{5}$$

correctly gives the additional contribution to the total modularity made by the division of this subgraph. In particular, note that if the subgraph is undivided, $Q_g$ is correctly zero, because the rows of $\mathbf{B}^{(g)}$, like those of $\mathbf{B}$, sum to zero. Note also that for the entire network Eq. (4) reduces to the previous definition for the modularity matrix, Eq. (2), since $k_i^{(g)} \to k_i$ and $d_g \to 2m$ in that case.
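A minimal NumPy sketch of the subgraph matrix follows. It relies on the equivalent form $B^{(g)}_{ij} = B_{ij} - \delta_{ij}\sum_{l \in g} B_{il}$, which follows from the definition because $\sum_{l \in g} B_{il} = k_i^{(g)} - k_i d_g/2m$. The function names and the two-triangle network are illustrative assumptions.

```python
import numpy as np

def modularity_matrix(A):
    """Full-network modularity matrix B_ij = A_ij - k_i k_j / 2m."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)
    return A - np.outer(k, k) / k.sum()

def subgraph_modularity_matrix(B, g):
    """B^(g) for the vertex list g, via B^(g)_ij = B_ij - delta_ij * sum_{l in g} B_il."""
    Bg = B[np.ix_(g, g)].copy()        # restrict B to the rows and columns in g
    Bg -= np.diag(Bg.sum(axis=1))      # subtract each row sum from the diagonal
    return Bg

# Hypothetical toy network: two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

B = modularity_matrix(A)
Bg = subgraph_modularity_matrix(B, [0, 1, 2])
print(np.allclose(Bg.sum(axis=1), 0))                                  # True
print(np.allclose(subgraph_modularity_matrix(B, list(range(6))), B))   # True: g = whole network
s = np.array([1.0, 1.0, -1.0])
print(s @ Bg @ s)    # negative: splitting the triangle would lower the modularity
```

For the triangle $\{0, 1, 2\}$ the split shown gives $Q_g = -64/14$, so this subgraph should be left whole.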

In repeatedly subdividing our network, an important question we need to address is at what point to halt the subdivision process. A nice feature of our method is that it provides a clear answer to this question: if there exists no division of a subgraph that will increase the modularity of the network, or equivalently that gives a positive value for $Q_g$, then there is nothing to be gained by dividing the subgraph and it should be left alone; it is indivisible in the sense of the previous section. This happens when there are no positive eigenvalues to the matrix $\mathbf{B}^{(g)}$, and thus our leading eigenvalue provides a simple check for the termination of the subdivision process: if the leading eigenvalue is zero, which is the smallest value it can take, then the subgraph is indivisible.

Note, however, that while the absence of positive eigenvalues is a sufficient condition for indivisibility, it is not a necessary one. In particular, if there are only small positive eigenvalues and large negative ones, the terms in Eq. (3) for negative $\beta_i$ may outweigh those for positive. It is straightforward to guard against this possibility, however: we simply calculate the modularity contribution for each proposed split directly and confirm that it is greater than zero.

Thus our algorithm is as follows. We construct the modularity matrix for our network and find its leading (most positive) eigenvalue and eigenvector. We divide the network into two parts according to the signs of the elements of this vector, and then repeat for each of the parts. If at any stage we find that the proposed split makes a zero or negative contribution to the total modularity, we leave the corresponding subgraph undivided. When the entire network has been decomposed into indivisible subgraphs in this way, the algorithm ends.
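The procedure above can be sketched as follows, assuming NumPy and dense matrices; `split_communities` is a hypothetical name, `tol` is an assumed numerical tolerance for "zero", and the vertex-moving refinement of the next section is deliberately left out. Both stopping tests from the text appear: no positive eigenvalue, and a direct check that the proposed split makes a positive contribution $\mathbf{s}^T\mathbf{B}^{(g)}\mathbf{s}$.

```python
import numpy as np

def split_communities(A, tol=1e-10):
    """Repeated leading-eigenvector bisection; returns a list of vertex lists."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)
    B = A - np.outer(k, k) / k.sum()           # full-network modularity matrix
    communities, pending = [], [list(range(A.shape[0]))]
    while pending:
        g = pending.pop()
        Bg = B[np.ix_(g, g)].copy()
        Bg -= np.diag(Bg.sum(axis=1))          # subgraph matrix B^(g)
        beta, U = np.linalg.eigh(Bg)
        if beta[-1] <= tol:                    # no positive eigenvalue: indivisible
            communities.append(g)
            continue
        s = np.where(U[:, -1] > 0, 1.0, -1.0)
        if s @ Bg @ s <= tol:                  # split would not raise Q: keep g whole
            communities.append(g)
            continue
        pending.append([v for v, sv in zip(g, s) if sv > 0])
        pending.append([v for v, sv in zip(g, s) if sv < 0])
    return communities

two_triangles = np.zeros((6, 6))               # hypothetical toy network
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    two_triangles[i, j] = two_triangles[j, i] = 1.0
print(sorted(sorted(c) for c in split_communities(two_triangles)))   # [[0, 1, 2], [3, 4, 5]]

K5 = np.ones((5, 5)) - np.eye(5)
print(split_communities(K5))                   # [[0, 1, 2, 3, 4]]: a clique is left whole
```

A positive leading eigenvalue guarantees both groups are non-empty, because the corresponding eigenvector is orthogonal to $(1,1,\ldots)$ and so has elements of both signs.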

One immediate corollary of this approach is that all "communities" in the network are, by definition, indivisible subgraphs. A number of authors have in the past proposed formal definitions of what a community is [9, 16, 24]. The present method provides an alternative, first-principles definition of a community as an indivisible subgraph.



Further techniques for modularity maximization 

In this section we describe briefly another method we have investigated for dividing networks in two by modularity optimization, which is entirely different from our spectral method. Although not of especial interest on its own, this second method is, as we will shortly show, very effective when combined with the spectral method.

Let us start with some initial division of our vertices into two groups: the most obvious choice is simply to place all vertices in one of the groups and no vertices in the other. Then we proceed as follows. We find among the vertices the one that, when moved to the other group, will give the biggest increase in the modularity of the complete network, or the smallest decrease if no increase is possible. We make such moves repeatedly, with the constraint that each vertex is moved only once. When all $n$ vertices have been moved, we search the set of intermediate states occupied by the network during the operation of the algorithm to find the state that has the greatest modularity. Starting again from this state, we repeat the entire process iteratively until no further improvement in the modularity results. Those familiar with the literature on graph partitioning may find this algorithm reminiscent of the Kernighan-Lin algorithm [25], and indeed the Kernighan-Lin algorithm provided the inspiration for our method.
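The vertex-moving procedure can be sketched as below, assuming NumPy; `refine_split`, `max_passes`, and `tol` are illustrative assumptions. It uses the fact that flipping $s_i$ changes $\mathbf{s}^T\mathbf{B}\mathbf{s}$ by $-4 s_i (\mathbf{B}\mathbf{s})_i + 4B_{ii}$. For clarity it recomputes $\mathbf{B}\mathbf{s}$ at every move instead of updating it incrementally, so it is slower than a careful implementation would be.

```python
import numpy as np

def refine_split(B, s, max_passes=100, tol=1e-12):
    """Vertex-moving refinement of a two-way split, in the spirit of Kernighan-Lin.

    B : symmetric modularity matrix (B for the whole network, or B^(g) for a subgraph).
    s : starting index vector of +1/-1 values.
    Returns the improved vector and its score s^T B s (the modularity up to 1/4m).
    """
    s = np.asarray(s, dtype=float).copy()
    best = s @ B @ s
    diag = np.diag(B)
    for _ in range(max_passes):
        cur, score = s.copy(), best
        moved = np.zeros(len(s), dtype=bool)
        pass_best, pass_s = best, s.copy()
        for _ in range(len(s)):
            # Flipping s_i changes s^T B s by -4 s_i (B s)_i + 4 B_ii.
            gains = -4.0 * cur * (B @ cur) + 4.0 * diag
            gains[moved] = -np.inf           # each vertex moves only once per pass
            i = int(np.argmax(gains))        # biggest increase, or smallest decrease
            cur[i] = -cur[i]
            moved[i] = True
            score += gains[i]
            if score > pass_best + tol:      # remember the best intermediate state
                pass_best, pass_s = score, cur.copy()
        if pass_best <= best + tol:          # no improvement in this pass: stop
            break
        best, s = pass_best, pass_s
    return s, best

# Hypothetical toy network: two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
k = A.sum(axis=1)
B = A - np.outer(k, k) / k.sum()

s, score = refine_split(B, [1, 1, -1, -1, -1, -1])   # vertex 2 starts on the wrong side
print(s, score)   # vertex 2 rejoins its triangle; score is about 10, i.e. Q = 10/28 = 5/14
```

Used as a fine-tuning step, `s` would be the sign vector from the leading eigenvector rather than a hand-picked start.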

Despite its simplicity, we find that this method works moderately well. It is not competitive with the best previous methods, but it gives respectable modularity values in the trial applications we have made. However, the method really comes into its own when it is used in combination with the spectral method introduced earlier. It is a common approach in standard graph partitioning problems to use spectral partitioning based on the graph Laplacian to give an initial broad division of a network into two parts, and then refine that division using the Kernighan-Lin algorithm. For community structure problems we find that the equivalent joint strategy works very well. Our spectral approach based on the leading eigenvector of the modularity matrix gives an excellent guide to the general form that the communities should take, and this general form can then be fine-tuned by our vertex moving method to raise the modularity further. The whole procedure is repeated to subdivide the network until every remaining subgraph is indivisible and no further improvement in the modularity is possible.

Typically, the fine-tuning stages of the algorithm add only a few percent to the final value of the modularity, but those few percent are enough to make the difference between a method that is merely good and one that is, as we will see, exceptional.



Example applications 

In practice, the algorithm developed here gives excellent results. For a quantitative comparison between our algorithm and others we follow Duch and Arenas [19] and compare values of the modularity for a variety of networks drawn from the literature. Results are shown in Table I for six different networks, the exact same six as used by Duch and Arenas. We compare modularity figures against three previously published algorithms: the betweenness-based algorithm of Girvan and Newman [10], which is widely used and has been incorporated into some of the more popular network analysis programs (denoted GN in the table); the fast algorithm of Clauset et al. [26] (CNM), which optimizes modularity using a greedy algorithm; and the extremal optimization algorithm of Duch and Arenas [19] (DA), which is arguably the best previously existing method, by standard measures, if one discounts methods impractical for large networks, such as exhaustive enumeration of all partitions or simulated annealing.

The table reveals some interesting patterns. Our algorithm clearly outperforms the methods of Girvan and Newman and of Clauset et al. for all the networks in the task of optimizing the modularity. The extremal optimization method, on the other hand, is more competitive. For the smaller networks, up to around a thousand vertices, there is essentially no difference in performance between our method and extremal optimization; the modularity values for the divisions found by the two algorithms differ by no more than a few parts in a thousand for any given network. For larger networks, however, our algorithm does better than extremal optimization, and furthermore the gap widens as network size increases, to a maximum modularity difference of about 6% for the largest network studied. For the very large networks that have been of particular interest in the last few years, therefore, it appears that our method for detecting community structure may be the most effective of the methods considered here.
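The "about 6%" figure can be reproduced from Table I. This short sketch assumes "difference" means the relative difference (this paper minus DA) divided by the DA value; the numbers are copied from the table and nothing else is assumed.

```python
# (size n, DA modularity, this paper's modularity), copied from Table I
table = {
    "karate": (34, 0.419, 0.419),
    "jazz musicians": (198, 0.445, 0.442),
    "metabolic": (453, 0.434, 0.435),
    "email": (1133, 0.574, 0.572),
    "key signing": (10680, 0.846, 0.855),
    "physicists": (27519, 0.679, 0.723),
}

# Relative difference of this paper's result with respect to extremal optimization
relative_gain = {name: (ours - da) / da for name, (n, da, ours) in table.items()}

for name, (n, da, ours) in table.items():
    print(f"{name:15s} n={n:6d}  {100 * relative_gain[name]:+.1f}%")
```

The smaller networks come out within about 0.7% either way, the key-signing network at about +1.1%, and the physicists network at about +6.5%.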

The modularity values given in Table I provide a useful quantitative measure of the success of our algorithm when applied to real-world problems. It is worthwhile, however, also to confirm that it returns sensible divisions of networks in practice. We have given one example demonstrating such a division in Fig. 2. We have also checked our method against many of the example networks used in previous studies [10, 17]. Here we give two more examples, both involving network representations of US politics.

                               modularity Q
network           size n    GN      CNM     DA      this paper
karate                34    0.401   0.381   0.419   0.419
jazz musicians       198    0.405   0.439   0.445   0.442
metabolic            453    0.403   0.402   0.434   0.435
email               1133    0.532   0.494   0.574   0.572
key signing       10 680    0.816   0.733   0.846   0.855
physicists        27 519    -       0.668   0.679   0.723

TABLE I: Comparison of modularities for the network divisions found by the algorithm described here and three other previously published methods as described in the text, for six networks of varying sizes. The networks are, in order, the karate club network of Zachary [23], the network of collaborations between early jazz musicians of Gleiser and Danon [27], a metabolic network for the nematode C. elegans [28], a network of email contacts between students [29], a trust network of mutual signing of cryptography keys [30], and a coauthorship network of scientists working on condensed matter physics [31]. No modularity figure is given for the last network with the GN algorithm because the slow $O(n^3)$ operation of the algorithm prevents its application to such large systems.

The first example is a network of books on politics, compiled by V. Krebs (unpublished, but see www.orgnet.com). In this network the vertices represent 105 recent books on American politics bought from the on-line bookseller Amazon.com, and edges join pairs of books that are frequently purchased by the same buyer. Books were divided (by the present author) according to their stated or apparent political alignment (liberal or conservative), except for a small number of books that were explicitly bipartisan or centrist, or had no clear affiliation.

Figure 3 shows the result of feeding this network through our algorithm. The algorithm finds four communities of vertices, denoted by the dotted lines in the figure. As we can see, one of these communities consists almost entirely of liberal books and one almost entirely of conservative books. Most of the centrist books fall in the two remaining communities. Thus these books appear to form "communities" of copurchasing that align closely with political views, a result that encourages us to believe that our algorithm is capable of extracting meaningful results from raw network data. It is particularly interesting to note that the centrist books belong to their own communities and are not, in most cases, merely lumped in with the liberals or conservatives; this may indicate that political moderates form their own community of purchasing.

FIG. 3: Krebs' network of books on American politics. Vertices represent books and edges join books frequently purchased by the same readers. Dotted lines divide the four communities found by our algorithm and shapes represent the political alignment of the books: circles (blue) are liberal, squares (red) are conservative, triangles (purple) are centrist or unaligned.

For our second example, we consider a network of political commentary web sites, also called "weblogs" or "blogs," compiled from on-line directories by Adamic and Glance [32], who also assigned a political alignment, conservative or liberal, to each blog based on content. The 1225 vertices in the network studied here correspond to the 1225 blogs in the largest component of Adamic and Glance's network, and undirected edges connect vertices if either of the corresponding blogs contained a hyperlink to the other on its front page. On feeding this network through our algorithm we discover that the network divides cleanly into conservative and liberal communities and, remarkably, the highest modularity found is for a division into just two communities. One community has 638 vertices of which 620 (97%) represent conservative blogs. The other has 587 vertices of which 548 (93%) represent liberal blogs. The algorithm found no division of either of these two groups that gives any positive contribution to the modularity; these groups are "indivisible" in the sense defined in this paper. This behavior is unique in our experience among networks of this size and is perhaps a testament not only to the widely noted polarization of the current political landscape in the United States but also to the strong cohesion of the two factions.

Finally, we mention that as well as being accurate our method is also fast. It can be shown that the running time of the algorithm scales with system size as $O(n^2 \log n)$ for the typical case of a sparse graph. This is considerably better than the $O(n^3)$ running time of the betweenness algorithm and slightly better than the $O(n^2 \log^2 n)$ of the extremal optimization algorithm [19]. It is not as good as the $O(n \log^2 n)$ for the greedy algorithm of Ref. [26], but our results are of far better quality than those for the greedy algorithm. In practice, running times are reasonable for networks up to about 100 000 vertices with current computers. For the largest of the networks studied here, the collaboration network, which has about 27 000 vertices, the algorithm takes around 20 minutes to run on a standard personal computer.
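One standard way to keep the eigenvector computation cheap on a sparse graph, offered here as an assumption rather than as a description of the implementation behind the timings above, is never to form the dense $\mathbf{B}$ and instead to multiply by it as $\mathbf{B}\mathbf{x} = \mathbf{A}\mathbf{x} - \mathbf{k}(\mathbf{k}^T\mathbf{x})/2m$, which costs $O(m + n)$ per product. Iterative eigensolvers need only such products. The sketch below assumes NumPy and a simple graph (no self-loops), and uses shifted power iteration purely as an easy-to-read stand-in for a faster Lanczos-type solver; both function names are hypothetical.

```python
import numpy as np

def modularity_matvec(edges, n, x):
    """Return B x = A x - k (k . x) / 2m without ever forming the dense matrix B.

    edges lists each undirected edge (i, j) once; the cost is O(m + n).
    """
    edges = np.asarray(edges)
    i, j = edges[:, 0], edges[:, 1]
    Ax = np.zeros(n)
    np.add.at(Ax, i, x[j])             # each edge contributes in both directions
    np.add.at(Ax, j, x[i])
    k = np.bincount(edges.ravel(), minlength=n).astype(float)   # degrees
    return Ax - k * (k @ x) / k.sum()

def leading_eigenpair(edges, n, iters=1000, seed=0):
    """Shifted power iteration for the most positive eigenvalue of B (illustrative only)."""
    k = np.bincount(np.asarray(edges).ravel(), minlength=n).astype(float)
    shift = 2.0 * k.max()              # Gershgorin bound: B + shift*I has no negative eigenvalues
    x = np.random.default_rng(seed).standard_normal(n)
    for _ in range(iters):
        y = modularity_matvec(edges, n, x) + shift * x
        x = y / np.linalg.norm(y)
    return x @ modularity_matvec(edges, n, x), x   # Rayleigh quotient (x has unit norm)

# Hypothetical toy network: two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
beta1, u1 = leading_eigenpair(edges, 6)
print(beta1)    # approximately sqrt(3) = 1.732... for this toy network
```

The shift makes the target eigenvalue the largest in magnitude, which is what plain power iteration finds; convergence can be slow when the top two eigenvalues of $\mathbf{B}$ are close, which is one reason Lanczos methods are preferred in practice.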

Conclusions 

In this paper we have examined the problem of detecting community structure in networks, which we frame as an optimization task in which one searches for the maximal value of the quantity known as modularity over possible divisions of a network. We have shown that this problem can be rewritten in terms of the eigenvalues and eigenvectors of a matrix we call the modularity matrix, and by exploiting this transformation we have created a new computer algorithm for community detection that demonstrably outperforms the best previous general-purpose algorithms in terms of both quality of results and speed of execution. We have applied our algorithm to a variety of real-world network data sets, including social and biological examples, showing it to give both intuitively reasonable divisions of networks and quantitatively better results as measured by the modularity.